ABSTRACT

In  this  report  the  Computer  Vision  System  toolbox  of  MATLAB®  and  the  Visual  Structure 
from  Motion  software  are  evaluated  on  three  datasets  of  airborne  imagery,  provided  by  Defence 
Research  and  Development  Canada,  and  on  a  dataset  of  synthetic  imagery  generated  by  the 
VIRSuite  software  developed  in  the  Defence  Science  and  Technology  Group.  The  user  interface 
for  a  developed  code  for  structure  reconstruction,  based  on  the  feature  detection  algorithms 
of  the  MATLAB  toolbox,  is  described. 
Evaluation of Available Software for Reconstruction
of a Structure from its Imagery

Executive  Summary 

The  objectives  of  the  “Three-Dimensional  Target  Reconstruction”  International  Collaboration 
Project  (CP  7-25-13)  between  Australia,  Canada  and  USA,  are  to  extract  three-dimensional 
information  from  a  target  via  low  to  high  cost  vision  sensors  on  moving  ground  or  air  platforms 
through  algorithm  development.  Methods  of  analysis  include  comparison  of  three-dimensional 
reconstruction  algorithms  to  ‘truth’  data  gathered  via  Laser  Detection  and  Ranging  (LADAR) 
sensors  including  inertial  sensors. 

Existing techniques allow reliable reconstruction of smooth surfaces, such as terrains, but the
reconstruction of three-dimensional buildings is very challenging with automated software.
This project is aimed at investigating new techniques to improve robustness of the
reconstruction process, and also looks at the issues of using LADAR data for direct reconstruction.
LADAR data is often sparse, particularly at longer ranges, and the fusion of two-dimensional
imagery can be beneficial. The expected defence outcomes from the project are:

•  the  development  of  robust  techniques  for  mission  planning  when  using  imaging  seekers 

•  collaboration  on  algorithmic  techniques  for  automatic  target  reconstruction,  detection 
and  recognition 

•  improved  simulation  techniques  and  sharing  of  test  data  sets. 

This  report  partially  addresses  the  objectives  of  the  collaboration  project.  The  Computer 
Vision  System  toolbox  of  MATLAB®  and  the  Visual  Structure  from  Motion  (VisualSFM) 
software  are  evaluated  on  three  datasets  of  airborne  imagery,  provided  by  Defence  Research 
and  Development  Canada,  and  on  a  dataset  of  synthetic  imagery  generated  by  the  VIRSuite 
software  developed  in  the  Defence  Science  and  Technology  Group.  The  user  interface  for  a 
developed  code  for  structure  reconstruction,  based  on  the  feature  detection  algorithms  of  the 
MATLAB  toolbox,  is  described. 
1 Introduction

The  objectives  of  the  “Three-Dimensional  Target  Reconstruction”  International  Collaboration 
Project  (CP  7-25-13)  between  Australia,  Canada  and  USA,  are  to  extract  three-dimensional 
information  from  a  target  via  low  to  high  cost  vision  sensors  on  moving  ground  or  air  platforms 
through  algorithm  development.  Methods  of  analysis  include  comparison  of  three-dimensional 
reconstruction  algorithms  to  ‘truth’  data  gathered  via  Laser  Detection  and  Ranging  (LADAR) 
sensors  including  inertial  sensors. 

Existing techniques allow reliable reconstruction of smooth surfaces, such as terrains, but the
reconstruction of three-dimensional buildings is very challenging with automated software.
This project is aimed at investigating new techniques to improve robustness of the
reconstruction process, and also looks at the issues of using LADAR data for direct reconstruction.
LADAR data is often sparse, particularly at longer ranges, and the fusion of two-dimensional
imagery can be beneficial. The expected defence outcomes from the project are:

•  the  development  of  robust  techniques  for  mission  planning  when  using  imaging  seekers 

•  collaboration  on  algorithmic  techniques  for  automatic  target  reconstruction,  detection 
and  recognition 

•  improved  simulation  techniques  and  sharing  of  test  data  sets. 

A series of reports has been published [Antanovskii 2014, Antanovskii 2016b, Antanovskii
2016a, Antanovskii 2017b, Antanovskii 2017a] which partially address the objectives of the
collaboration project. In particular, a prototype MATLAB code for 3D target reconstruction
has been developed, which implements the basic 3D reconstruction algorithms described in
[Hartley & Zisserman 2003]. In this follow-up report the Computer Vision System toolbox of
MATLAB® and the Visual Structure from Motion (VisualSFM) software are evaluated on three
datasets of airborne imagery, provided by Defence Research and Development Canada (DRDC),
and on a dataset of synthetic imagery generated by the VIRSuite software developed in the
Defence Science and Technology Group [Swierkowski et al. 2014]. The user interface of the
code developed for structure reconstruction, based on the feature detection algorithms of the
MATLAB toolbox, is described.


2  Overview  of  feature  detection  algorithms 

Comprehensive background material and basic numerical methods for structure reconstruction
are provided in [Hartley & Zisserman 2003]. It is worth emphasizing that this monograph
does not address feature detection technology, which constitutes a broad subject of
research in its own right, and in most situations assumes that all world points of interest are
visible in all views. In real-world scenarios this is not the case due to point occlusion.

Feature detection is a low-level image processing operation based on examining every pixel
of an image and its immediate neighbourhood to associate feature descriptors to points of
interest, called keypoints; the extracted descriptors are then used to match the keypoints in
any two images. Several feature detection algorithms are publicly available, namely:

• Combined Corner and Edge Detector algorithm [Harris & Stephens 1988]

• Minimum Eigenvalue algorithm [Shi & Tomasi 1994]

• Scale-Invariant Feature Transform algorithm [Lowe 1999, Lowe 2004a]

• Maximally Stable Extremal Regions algorithm [Matas et al. 2002, Mikolajczyk et al.
2005, Nister & Stewenius 2008, Obdrzalek et al. 2009]

• Features from Accelerated Segment Test algorithm [Rosten & Drummond 2005]

• Speeded-Up Robust Features algorithm [Bay, Tuytelaars & Van Gool 2006, Bay et al.
2008, Bradski & Kaehler 2008]

• Binary Robust Invariant Scalable Keypoints algorithm [Leutenegger, Chli & Siegwart
2011].

The above feature detection, extraction and matching algorithms, except for the patented
Scale-Invariant Feature Transform (SIFT) algorithm [Lowe 2004b], are implemented in the
Computer Vision System and Image Processing toolboxes of MATLAB®.
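To illustrate what "examining every pixel and its immediate neighbourhood" can mean for the first two algorithms in the list, the following minimal Python sketch computes a Minimum Eigenvalue score and a Harris-style corner response from the gradient structure tensor of a small patch. The central-difference gradients, the unweighted 5-by-5 window and the constant k = 0.04 are assumptions made for the example; the sketch does not reproduce the MATLAB toolbox implementations.

```python
import math


def structure_tensor(patch):
    """Sum products of central-difference gradients over the patch interior."""
    sxx = sxy = syy = 0.0
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = (patch[r][c + 1] - patch[r][c - 1]) / 2.0
            gy = (patch[r + 1][c] - patch[r - 1][c]) / 2.0
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
    return sxx, sxy, syy


def min_eigenvalue(sxx, sxy, syy):
    """Smaller eigenvalue of the 2x2 tensor [[sxx, sxy], [sxy, syy]]."""
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace - root


def harris_response(sxx, sxy, syy, k=0.04):
    """Harris-style response det - k * trace^2 (k is an assumed constant)."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Three hypothetical 5x5 grey-level patches: flat, vertical edge, corner.
flat = [[0.0] * 5 for _ in range(5)]
edge = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
corner = [[0.0] * 5, [0.0] * 5] + [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(3)]

for name, patch in (("flat", flat), ("edge", edge), ("corner", corner)):
    tensor = structure_tensor(patch)
    print(name, min_eigenvalue(*tensor), harris_response(*tensor))
```

Both scores are large only for the corner patch: the edge patch has one vanishing eigenvalue and a negative Harris-style response, and the flat patch gives zero for both.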


3  Imagery  datasets 

Three  datasets  of  airborne  imagery  were  provided  by  DRDC  for  the  purpose  of  software 
evaluation  and  benchmarking. 


Figure  1:  Typical  image  of  Dataset  A 


The first dataset (“Dataset A”) contains 302 mid-wavelength infra-red (MWIR) airborne
images of a scene in the Ottawa (Canada) area. A meta-data spreadsheet with poses of
cameras and a high-resolution Computer-Aided Design (CAD) model of the building obtained
by terrestrial non-contact 3D imaging systems (laser scanner) were also provided by DRDC
[Beraldin & Cournoyer 2014]. The images were 480-by-640 pixels in size. A typical MWIR
image is shown in Figure 1.
Figure 2: Typical image of Dataset B


The second dataset (“Dataset B”) contains 71 airborne images of a scene containing a church.
The texture of the buildings was altered, but the rest of the scene was intact. A meta-data
spreadsheet with poses of cameras was also provided by DRDC. The images were 1280-by-720
pixels in size. One of them is shown in Figure 2.


Figure 3: Typical image of Dataset C


The  third  dataset  (“Dataset  C”)  contains  50  airborne  images  of  a  scene  containing  a  small 
farm.  The  images  were  1000-by-1000  pixels  in  size.  One  of  them  is  shown  in  Figure  3. 


Figure 4: Typical image of Dataset D


In addition to the real-world imagery provided by DRDC, the fourth dataset (“Dataset D”)
was evaluated, which consisted of 18 synthetic MWIR images generated by VIRSuite. This
software was developed in the DST Group [Swierkowski et al. 2014]. The synthetic images
were 1024-by-1024 pixels in size. One of them is shown in Figure 4.


4  Evaluation  of  the  MATLAB  toolbox 

The  latest  releases  of  the  Image  Processing  and  Computer  Vision  System  toolboxes  support 
the  above-mentioned  feature  detection,  extraction  and  matching  algorithms  except  for  the 
SIFT  algorithm.  These  algorithm  implementations  with  default  settings  were  evaluated  on 
the  four  imagery  datasets.  All  the  images  in  each  dataset  were  processed,  and  the  average 
numbers  of  detected  keypoints  and  putative  matches  calculated.  The  results  are  summarized 
in  Tables  1-4,  respectively. 

The Minimum Eigenvalue algorithm detected the greatest number of features on average.
However, its ratios of the average number of putative matches to the average number of
detected keypoints were considerably smaller than those of the Speeded-Up Robust Features
(SURF) algorithm, whose ratios were quite high even though its average numbers of detected
keypoints were moderate. This indicates that the SURF descriptors provide better
identification of the detected features.
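The comparison of match ratios can be reproduced directly from the tabulated averages. The short Python sketch below uses the Minimum Eigenvalue and SURF entries of Tables 1-3; the dictionary layout and the function name are illustrative only.

```python
# Average (detected keypoints, putative matches) taken from Tables 1-3.
TABLES = {
    "A": {"Minimum Eigenvalue": (961, 556), "SURF": (477, 359)},
    "B": {"Minimum Eigenvalue": (1918, 296), "SURF": (713, 340)},
    "C": {"Minimum Eigenvalue": (9608, 383), "SURF": (991, 377)},
}


def match_ratio(keypoints, matches):
    """Fraction of detected keypoints that received a putative match."""
    return matches / keypoints


for dataset, rows in TABLES.items():
    for algorithm, (keypoints, matches) in rows.items():
        ratio = match_ratio(keypoints, matches)
        print(f"Dataset {dataset}: {algorithm:<18} {ratio:.2f}")
```

For each of the three airborne datasets the SURF ratio exceeds the Minimum Eigenvalue ratio, although the Minimum Eigenvalue algorithm detects more keypoints.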

The keypoints detected by the Minimum Eigenvalue algorithm in the first two images of each
dataset, and their putative matches, are shown in Figures A1-A4, respectively. It is
seen that the number of matched features in the synthetic images is very small compared
with the number of detected features. Varying the default control parameters provided
more putative matches but did not help much, as the additional matched features were
outliers.

Table 1: Detected keypoints and putative matches in the images of Dataset A

Feature detection algorithm                 Detected keypoints  Putative matches
Combined Corner and Edge Detector                           98                63
Minimum Eigenvalue                                         961               556
Maximally Stable Extremal Regions                          419               232
Features from Accelerated Segment Test                     133                50
Speeded-Up Robust Features                                 477               359
Binary Robust Invariant Scalable Keypoints                 141                23

Table 2: Detected keypoints and putative matches in the images of Dataset B

Feature detection algorithm                 Detected keypoints  Putative matches
Combined Corner and Edge Detector                          227                60
Minimum Eigenvalue                                        1918               296
Maximally Stable Extremal Regions                          466               143
Features from Accelerated Segment Test                     272                49
Speeded-Up Robust Features                                 713               340
Binary Robust Invariant Scalable Keypoints                 256                19

Table 3: Detected keypoints and putative matches in the images of Dataset C

Feature detection algorithm                 Detected keypoints  Putative matches
Combined Corner and Edge Detector                         1202               191
Minimum Eigenvalue                                        9608               383
Maximally Stable Extremal Regions                          559               156
Features from Accelerated Segment Test                     984               112
Speeded-Up Robust Features                                 991               377
Binary Robust Invariant Scalable Keypoints                 548                22

Table 4: Detected keypoints and putative matches in the images of Dataset D

Feature detection algorithm                 Detected keypoints  Putative matches
Combined Corner and Edge Detector                         1440                 6
Minimum Eigenvalue                                        6776                10
Maximally Stable Extremal Regions                          260                12
Features from Accelerated Segment Test                     134                 1
Speeded-Up Robust Features                                 142                 8
Binary Robust Invariant Scalable Keypoints                  92                 0
5 Evaluation of VisualSFM


Several methodologies have been employed for the evaluation of 3D target reconstruction
software, which are based on various types of metrics [Cavegn et al. 2014, Nex et al. 2015,
Ballabeni et al. 2015, Nikolov & Madsen 2016]. In this report we did not compare reconstructed
objects with ‘ground-truth’ data in detail, so only visual recognition of a scene from the
reconstructed point cloud was applied.

VisualSFM  developed  by  Changchang  Wu  [Wu  2013,  Wu  et  al.  2011]  can  be  downloaded  from 
http://ccwu.me/vsfm/.  This  software  uses  the  SIFT  algorithm  to  detect  feature  points,  and 
its  performance  depends  on  the  Graphics  Processing  Unit  (GPU)  specification. 

VisualSFM  was  applied  to  the  images  of  Datasets  A-D.  The  reconstructed  point  clouds  and 
camera  poses  are  shown  in  Figures  A5-A8,  respectively.  Camera  poses  are  not  seen  well  in 
the  figures.  The  graphical  user  interface  of  VisualSFM  allows  the  user  to  scale,  rotate  and 
translate  the  reconstructed  scene.  In  general,  this  provides  a  means  to  visually  analyse  the 
obtained  model. 

The visual examination of the obtained point clouds demonstrated that the scenes were
reconstructed fairly well from the airborne images of Datasets A-C. Also, the number of
reconstructed camera poses matches the number of images. However, the synthetic MWIR
images did not provide a sufficient number of points. Moreover, only two cameras out of 18
were reconstructed by VisualSFM.

Though future enhancement of the VisualSFM software is expected in view of the
publication [Zheng & Wu 2015], it is very unlikely that the source code will be released. This
motivated us to develop an in-house code to meet future requirements.


6  Description  of  the  developed  MATLAB  code 


The developed MATLAB code implements several multi-view reconstruction algorithms from
[Hartley & Zisserman 2003], and currently relies on the Image Processing and Computer
Vision System toolboxes of MATLAB to detect, extract and match features.

Figure 5: The main menu of the MATLAB application for target reconstruction

The main menu of the code is shown in Figure 5. A dataset of imagery has to be loaded first.
Images can be in any standard format supported by the Image Processing toolbox. If camera
poses are available, the user can create an XML metadata file called metadata.xml, a
sample fragment of which is shown below:


<?xml version="1.0" encoding="utf-8"?>
<metadata>
  <image file="image001.jpg" x="303086" y="5.2002e+06" z="594.284"/>
  <image file="image002.jpg" x="303101" y="5.20027e+06" z="594.284"/>
  <image file="image003.jpg" x="303105" y="5.20035e+06" z="597.017"/>
  <image file="image004.jpg" x="303103" y="5.20042e+06" z="602.179"/>
  <image file="image005.jpg" x="303097" y="5.20049e+06" z="606.734"/>
  <image file="image006.jpg" x="303089" y="5.20056e+06" z="608.556"/>
  <image file="image007.jpg" x="303078" y="5.20063e+06" z="608.556"/>
  <image file="image008.jpg" x="303064" y="5.2007e+06" z="610.074"/>
  <image file="image009.jpg" x="303044" y="5.20077e+06" z="610.378"/>
  <image file="image010.jpg" x="303018" y="5.20083e+06" z="608.556"/>
  <image file="image011.jpg" x="302989" y="5.2009e+06" z="605.216"/>
  <image file="image012.jpg" x="302955" y="5.20095e+06" z="600.964"/>
  <image file="image013.jpg" x="302917" y="5.20101e+06" z="598.839"/>
  <image file="image014.jpg" x="302874" y="5.20107e+06" z="597.624"/>
  <image file="image015.jpg" x="302829" y="5.20112e+06" z="594.891"/>
  <image file="image016.jpg" x="302779" y="5.20116e+06" z="594.284"/>
  <image file="image017.jpg" x="302725" y="5.20121e+06" z="594.588"/>
</metadata>

and place it in the folder where the dataset images are located. The file and x, y, z attributes
of  each  <image>  element  must  specify  the  image  file  name  and  the  Cartesian  coordinates  of  the 
camera  corresponding  to  the  taken  image,  respectively.  This  optional  information  is  primarily 
used  for  bringing  the  reconstructed  model  to  the  reference  frame  of  the  cameras  for  better 
visualization,  therefore  camera  orientation  is  ignored.  Since  this  reference  frame  is  defined  by 
at  least  five  cameras  in  general  configuration,  the  user  will  be  prompted  to  load  the  metadata 
file  if  five  or  more  images  are  selected.  Then  the  loaded  images  can  be  optionally  resized  and 
their  contrast  enhanced. 
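As an illustration of the metadata format, the following Python sketch reads the file and x, y, z attributes of each <image> element into a dictionary of camera positions. It is not part of the developed MATLAB code; the function name and the two-element sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the sample fragment; passed as bytes so that the
# encoding declaration in the XML prolog is honoured by the parser.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<metadata>
  <image file="image001.jpg" x="303086" y="5.2002e+06" z="594.284"/>
  <image file="image002.jpg" x="303101" y="5.20027e+06" z="594.284"/>
</metadata>"""


def load_camera_positions(xml_bytes):
    """Return {image file name: (x, y, z)} from a metadata document."""
    root = ET.fromstring(xml_bytes)
    positions = {}
    for image in root.iter("image"):
        xyz = tuple(float(image.get(axis)) for axis in ("x", "y", "z"))
        positions[image.get("file")] = xyz
    return positions


positions = load_camera_positions(SAMPLE)
for name, xyz in positions.items():
    print(name, xyz)

# The reference frame needs at least five cameras, so a caller could check
# len(positions) >= 5 before attempting the alignment.
```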

After  a  dataset  of  imagery  is  loaded  the  user  can  execute  the  following  reconstruction  steps: 

1  Detect  features 

2  Match  features 

3  Estimate  camera  graph 

4  Estimate  homography  graph 

5  Estimate  visibility  relation 

6  Run  bundle  adjustment. 

Relevant solver control parameters can be selected at each step. Visualisation of intermediate
reconstruction results is available, and each step can be repeated with adjusted solver control
parameters. Alternatively, all six steps can be executed in batch mode with the default
solver parameters.

In  Step  1,  after  the  feature  detection  algorithm  is  specified,  all  images  are  processed  to  detect 
and  extract  features.  Then  the  detected  keypoints  can  be  viewed.  The  user  can  change  the 
feature  detection  algorithm  and  re-run  Step  1  if  required. 

In Step 2 the user is prompted to select the camera graph type, which specifies which image
pairs to process, and then prompted to accept or change the default rejection ratio, a
parameter in the match feature algorithm of the Computer Vision System toolbox, which is used
to decide when to reject ambiguous matches.
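A rejection ratio of this kind is commonly applied as a nearest-neighbour ratio test: a match is kept only if the closest descriptor is clearly closer than the second closest. The Python sketch below illustrates the idea under that assumption; the Euclidean distance, the threshold 0.6 and the function name are choices made for the example and are not taken from the toolbox.

```python
import math


def match_with_ratio_test(desc_a, desc_b, max_ratio=0.6):
    """Return index pairs (i, j) whose best match passes the ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every candidate, closest first.
        ranked = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(ranked) < 2:
            continue
        (best, j_best), (second, _) = ranked[0], ranked[1]
        # Ambiguous: the runner-up is almost as close as the best candidate.
        if best < max_ratio * second:
            matches.append((i, j_best))
    return matches


# Toy 2D descriptors: the first has one clear partner, the second has two
# nearly identical candidates and is therefore rejected as ambiguous.
desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (10.0, 10.0), (5.0, 4.9), (5.1, 5.0)]
print(match_with_ratio_test(desc_a, desc_b))
```

Lowering the ratio rejects more matches, and raising it admits more, including ambiguous ones.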

In Step 3 the user is prompted to select solver control parameters for the estimation of
bifocal tensors for each edge of the camera graph. Employing the implemented RANSAC
algorithm [Fischler & Bolles 1981], outliers are detected for each image pair by randomly
sampling 7-point correspondences and estimating the bifocal tensor (the fundamental matrix)
using the non-linear algorithm described in [Hartley & Zisserman 2003, Page 281]. Then
the bifocal tensor is re-estimated from the remaining inliers using the normalized 8-point
algorithm [Hartley & Zisserman 2003, Page 282]. Finally, the Gold Standard method is applied
to optimize the bifocal tensor by minimizing the re-projection error objective function using
the implemented Levenberg-Marquardt solver [Levenberg 1944, Marquardt 1963]. At the end
of this procedure the camera graph, built on images as vertices and image pairs with a
sufficient number of point correspondences as edges, is fully estimated. Some edges of the
camera graph are deleted if an insufficient number of matched inliers is detected. After Step 3
the user can check the refined correspondences and camera graph co-cycle conditions, and view
the matched inliers. The refined correspondences between matched inliers must be a bijection
for each image pair (that is, for each edge of the camera graph), and the camera co-cycle
condition must be satisfied for each graph cycle for consistency [Antanovskii 2017b]. If these
conditions are not met, e.g. due to excessive noise, the user has to change the solver control
parameters, such as the inlier tolerance, or choose another feature detection algorithm.
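The size of the random sample matters for RANSAC because the chance of drawing an outlier-free sample falls quickly as the sample grows. A standard estimate of the number of samples N needed to draw at least one all-inlier sample with probability p, given an inlier fraction w and sample size s, is N = log(1 - p) / log(1 - w^s). The Python sketch below evaluates it for the sample sizes mentioned in this report; whether the developed code chooses its iteration count in this way is not stated, so the function is illustrative only.

```python
import math


def ransac_trials(confidence, inlier_ratio, sample_size):
    """Samples needed so that, with probability `confidence`, at least one
    sample of `sample_size` correspondences contains inliers only."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must lie in (0, 1]")
    p_good_sample = inlier_ratio ** sample_size
    if p_good_sample >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


# 99% confidence with half of the putative matches assumed to be outliers.
for size in (5, 7, 8):
    print(size, ransac_trials(0.99, 0.5, size))
```

Under these assumed values, sampling 7 correspondences needs roughly half as many trials as sampling 8, which is one practical argument for using a minimal sample inside the RANSAC loop and a larger set only for the re-estimation from inliers.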

In Step 4 the user is prompted to select solver control parameters for the estimation of 3D
transition homographies. The 3D homography is a projective transformation of the 3D
projective space bringing locally reconstructed world points for each image pair to a global
reference frame. The transition homographies are parametrized by the edges of the homography
graph, which is a subgraph of the line graph [Harary 1972] of the camera graph [Antanovskii
2016a]. The RANSAC solver is applied to estimate the 3D homography by randomly sampling
5-point correspondences, followed by minimizing the re-projection error with respect to the
corresponding local camera maps using the Levenberg-Marquardt solver. If the homography
graph has cycles, the co-cycle conditions have to be satisfied [Antanovskii 2016a]. However,
in the presence of significant noise in images, the transition homographies may not satisfy the
co-cycle conditions accurately enough.
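A 3D homography can be represented by a 4-by-4 matrix acting on homogeneous coordinates and defined only up to an overall scale, which leaves 16 - 1 = 15 free parameters. Since each 3D point correspondence supplies three independent constraints, five correspondences in general position are the minimum needed, consistent with the 5-point sampling described above. The Python sketch below applies such a matrix to a point and shows the scale invariance; the matrix entries are made up for the example.

```python
def apply_homography(h, point):
    """Map a 3D point through a 4x4 homography given as nested lists."""
    x = list(point) + [1.0]  # homogeneous coordinates
    y = [sum(h[r][c] * x[c] for c in range(4)) for r in range(4)]
    if abs(y[3]) < 1e-12:
        raise ValueError("point is mapped to the plane at infinity")
    return tuple(value / y[3] for value in y[:3])


def scaled(h, factor):
    """Multiply every entry of the matrix by a non-zero factor."""
    return [[factor * value for value in row] for row in h]


# Hypothetical homography: a shift along x combined with a projective term.
H = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]

point = (1.0, 2.0, 2.0)
print(apply_homography(H, point))
print(apply_homography(scaled(H, 3.0), point))  # same point: scale cancels
```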

In Step 5 the user is prompted to select a uniqueness tolerance for the estimation of the
visibility relation [Antanovskii 2016b]. First, camera maps are computed in a global coordinate
system from the transition homographies and local canonical camera matrices. The locally
reconstructed world points are brought to the global coordinate system at the same time.
World points, reconstructed from different views, invariably contain duplicates which have to
be eliminated. The uniqueness tolerance is the threshold for merging potentially duplicate
points. A metric between two 3D points is defined as the maximum of the distances of
projected image points (e.g. in pixels) averaged over a set of cameras. This metric may not
be a proper distance, because it can vanish for different points when the set of cameras is
degenerate. For example, this situation always occurs for a single camera. However, in the
general configuration of at least two cameras, this metric has all the properties of a distance.
This approach is promising as it easily copes with the common situation when a 3D point
disappears from a view and then re-appears again. After Step 5 the user can check the camera
pose reconstruction error, if the metadata file was loaded, check the re-projection error, view
the re-projected keypoints in images, view the world points with or without camera positions,
and view the camera and homography graphs. A random 3D homography can be applied
from the user interface to change the projective view. There are 15 degrees of freedom for
the selection of an appropriate homography, which are not intuitive as opposed to object
rotation. So, a random homography is easier to apply for the visualization purpose, which
may eventually make the reconstructed scene more recognizable after a few attempts.
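The image-based metric between 3D points can be sketched as follows. The text mentions both a maximum and an average over cameras, so the reduction is left as a parameter here, with the maximum as an assumed default; the 3-by-4 camera matrices are hypothetical. The example also shows the degeneracy for a single camera: two distinct points on the same viewing ray are at zero distance.

```python
import math


def project(camera, point):
    """Project a 3D point with a 3x4 camera matrix; returns pixel (u, v)."""
    x = list(point) + [1.0]
    u, v, w = (sum(camera[r][c] * x[c] for c in range(4)) for r in range(3))
    if abs(w) < 1e-12:
        raise ValueError("point projects to infinity in this camera")
    return (u / w, v / w)


def image_distance(cameras, p, q, reduce=max):
    """Combine the per-camera image distances between projections of p and q."""
    return reduce([math.dist(project(cam, p), project(cam, q)) for cam in cameras])


# Two hypothetical cameras: one at the origin, one shifted along the x axis.
cam1 = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
cam2 = [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

p, q = (0.0, 0.0, 1.0), (0.0, 0.0, 2.0)  # distinct points on one ray of cam1
print(image_distance([cam1], p, q))        # vanishes for the single camera
print(image_distance([cam1, cam2], p, q))  # positive once a second view is added
```

Two reconstructed points would be merged as duplicates when this value falls below the uniqueness tolerance.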
In Step 6 the user is prompted to select solver control parameters for bundle adjustment. The
whole set of world points without duplicates and camera maps is optimized by minimizing
the global re-projection error using the Levenberg-Marquardt solver. Sparse storage is used
for the associated Jacobian matrices in the implementation of the bundle adjustment. After
Step 6 the pattern of the sparse Jacobian matrix can be visualized.
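The Jacobian of the bundle adjustment is sparse because each image observation depends on one camera and one world point only. The Python sketch below builds the non-zero pattern from a list of (camera, point) observations. The parameter counts (12 per camera matrix, 3 per point) and the column ordering are assumptions for the example; the parametrization used in the developed code is not described in this report.

```python
def jacobian_pattern(observations, n_cameras, n_points,
                     cam_params=12, pt_params=3):
    """Return (pattern, shape) for the re-projection error Jacobian.

    `observations` lists (camera index, point index) pairs, one per keypoint;
    each contributes two rows (u and v residuals). `pattern` is the set of
    (row, column) positions that can be non-zero.
    """
    n_cols = n_cameras * cam_params + n_points * pt_params
    pattern = set()
    for k, (cam, pt) in enumerate(observations):
        cam_start = cam * cam_params
        pt_start = n_cameras * cam_params + pt * pt_params
        cols = list(range(cam_start, cam_start + cam_params))
        cols += list(range(pt_start, pt_start + pt_params))
        for row in (2 * k, 2 * k + 1):
            pattern.update((row, col) for col in cols)
    return pattern, (2 * len(observations), n_cols)


# Hypothetical visibility: 3 cameras, 4 points, each point seen in 1-2 views.
observations = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]
pattern, (n_rows, n_cols) = jacobian_pattern(observations, n_cameras=3, n_points=4)
density = len(pattern) / (n_rows * n_cols)
print(n_rows, n_cols, len(pattern), density)
```

Every row holds the same number of non-zero entries regardless of how many cameras and points are present, so the density decreases as the dataset grows, which is what makes sparse storage worthwhile.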


7  Discussion 


Four  imagery  datasets  were  evaluated  with  the  Computer  Vision  System  and  Image  Processing 
toolboxes  of  MATLAB,  and  with  VisualSFM.  It  was  revealed  that  the  real-world  imagery 
provided  reasonably  good  reconstruction  of  the  structures  and  camera  poses,  whereas  the 
synthetic  images  generated  by  VIRSuite  performed  poorly.  The  latter  issue  will  be  addressed 
in  the  future. 

The user interface of the developed MATLAB code has been described, which is currently based
on the toolboxes of MATLAB [Antanovskii 2016a, Antanovskii 2017b]. The code is thoroughly
tested in unit tests and partially validated against the imagery datasets. The integration of
the implemented SIFT algorithm [Antanovskii 2017a] with the MATLAB code will be the
subject of a separate publication. Another important extension of the code to be addressed
in the future is its integration with LADAR sensing data.
Appendix A: Simulation results


Figure A1: Detected and matched features in two images of Dataset A. Panels: (i) detected
features in Image 1; (ii) detected features in Image 2; (iii) putative matches (331) in
images (1, 2)
Figure A2: Detected and matched features in two images of Dataset B. Panels: (i) detected
features in Image 1; (ii) detected features in Image 2; (iii) putative matches (300) in
images (1, 2)
Figure A3: Detected and matched features in two images of Dataset C. Panels: (i) detected
features in Image 1; (ii) detected features in Image 2; (iii) putative matches (293) in
images (1, 2)
Figure A4: Detected and matched features in two images of Dataset D. Panels: (i) detected
features in Image 1, detected keypoints (S145) [Minimum Eigenvalue algorithm]; (ii) detected
features in Image 2, detected keypoints (5661) [Minimum Eigenvalue algorithm]; (iii) putative
matches


UNCLASSIFIED 


DST-Group-TR-3356 


Figure  A5:  Point  cloud  and  camera  poses  reconstructed  from  302  images  of  Dataset  A 


Figure  A6:  Point  cloud  and  camera  poses  reconstructed  from  71  images  of  Dataset  B 
Figure A7: Point cloud and camera poses reconstructed from 50 images of Dataset C



Figure  A8:  Point  cloud  and  camera  poses  reconstructed  from  18  images  of  Dataset  D 